ABSTRACT



Statistical mechanics is used to study unrealizable generalization in two large feed-forward
neural networks with binary weights and output, a perceptron and a tree committee machine.
The student is trained by a teacher that is larger, i.e. has more units than the student. It is
shown that this is the same as using training data corrupted by Gaussian noise. Each machine is
considered in the high temperature limit and in the replica symmetric approximation as well as
for one step of replica symmetry breaking. For the perceptron a phase transition is found for low
noise. However, the transition is not to optimal learning. If the noise is increased the transition
disappears. In both cases $\epsilon_g$ approaches optimal performance with a
$(\ln\alpha/\alpha)^k$ decay for large $\alpha$.
For the tree committee machine noise in the input layer is studied, as well as noise in the hidden
layer. If there is no noise in the input layer there is, in the case of one step of replica symmetry
breaking, a phase transition to optimal learning at some finite $\alpha$ for all levels of noise in
the hidden layer. When noise is added to the input layer the generalization behavior is similar to
that of the perceptron. For one step of replica symmetry breaking, in the realizable limit, the
values of the spinodal points found in this paper disagree with previously reported estimates
[?,?]. Here the value $\alpha_{sp} = 2.79$ is found for the tree committee machine and
$\alpha_{sp} = 1.67$ for the perceptron.

PACS: 87.10, 02.50, 05.20, 64.60C

1. Introduction

A feed-forward neural network can be used to estimate an unknown rule from random
examples [?] by adaption of its weights. Using methods from statistical mechanics of
disordered systems [?] the performance of a student network trained on examples obtained from
a teacher network of the same architecture has been studied (for a review see [5]). In this
case the rule is said to be realizable since it is possible for the student to develop the same
weights as the teacher.

One way to construct an unrealizable rule is to allow for a teacher that is larger (more
units) than the student. This will be shown to be equivalent to adding Gaussian noise to
the training set. The noisy data scenario has been investigated for networks with continuous
weights [?, 7, ?]. In the limit where the teacher is infinitely larger than the student (large
noise limit) the only thing the student can do is to learn each example by heart, and in this
limit the problem reduces to that of storage capacity.

In this paper the generalization behavior of two different types of binary neural networks 
with binary weights is studied, a perceptron (section 2) and a tree committee machine 
(section 3), in the limit where the number of units is large. The rule is defined by a teacher 
of the same type but having more units than the student, making the task unrealizable. 

The training of the student, having $N$ units, is based on $\alpha N$ examples obtained by
picking inputs $\xi^\mu$ and assigning outputs $\tau^\mu$ as given by the teacher. With
$\sigma^\mu$ being the $\mu$th output of the student, a training energy
$E = \sum_\mu \theta(-\sigma^\mu \tau^\mu)$ is defined, which leads to a probability
density with Boltzmann weight $e^{-\beta E}$, where $\beta = 1/T$ is the inverse temperature.
First the high temperature limit is considered for each type of network, the perceptron in
section (2.1) and the tree committee machine in section (3.1). Then, in sections (2.2) and
(3.2), the replica trick is used, assuming replica symmetry (RS), to study the average over all
training sets of the free energy, $\beta f$. In sections (2.3) and (3.3) the corrections given by
one step of replica symmetry breaking (RSB) are discussed. Since, in the noiseless limit, the
value of the spinodal point found in the RSB-case disagrees with previously reported estimates
[?,?], some time is spent on the saddle point equations in appendix A. Finally, in appendix B,
the procedure for finding the asymptotic generalization behavior for large $\alpha$ is given.



2. A Large Binary Perceptron with Ising weights. 

Let the student and the teacher have $N$ and $M$ input units respectively with $N < M$.
Presented an input, $s$, the teacher evaluates $\tau(s) = \mathrm{sgn}(v \cdot s)$, while the
student computes $\sigma(s_0) = \mathrm{sgn}(w \cdot s_0)$, given the input $s_0$. Here $s$ and
$v$ are elements of $\mathbb{R}^M$, while vectors having a zero subscript are elements of
$\mathbb{R}^N$. When the student is presented the same input vector, $\xi$, as the teacher, it
only considers the $N$ first components, $\xi_0$. Thus the target rule will be,



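$$\tau(\xi) = \mathrm{sgn}\left(\frac{1}{\sqrt{M}}\sum_{j=1}^{M} v_j \xi_j\right) \qquad (2.1)$$

$$= \mathrm{sgn}\left(\frac{1}{\sqrt{M}}\sum_{j=1}^{N} v_j \xi_j + \frac{1}{\sqrt{M}}\sum_{j=N+1}^{M} v_j \xi_j\right) \qquad (2.2)$$

$$= \mathrm{sgn}\left(\frac{1}{\sqrt{N}}\, v_0 \cdot \xi_0 + \eta\right), \qquad \eta = \frac{1}{\sqrt{N}}\sum_{j=N+1}^{M} v_j \xi_j , \qquad (2.3)$$

where $v_0$ is constructed from the first $N$ components of $v$. The last step uses that the
sign is unchanged by a positive rescaling of its argument. Effectively this means that the
student will be given the task $\tau'(\xi_0) = \mathrm{sgn}(v_0 \cdot \xi_0)$ with noise on the
input vector and/or on the weight vector. Since $\eta$ is constructed from independent
random variables, $v_j$ and $\xi_j$ ($j = N+1, \dots, M$) with unit variance, $\eta$ will also be
Gaussian with variance,

$$\sigma_\eta^2 = \left\langle \left( \frac{1}{\sqrt{N}} \sum_{j=N+1}^{M} v_j \xi_j \right)^2 \right\rangle = \frac{1-\gamma^2}{\gamma^2} , \qquad (2.5)$$

where $\gamma = \sqrt{N/M}$. $\gamma$ has the simple interpretation of the relative size of the
student to the teacher. If $\gamma = 1$ the student and the teacher are of the same size, i.e.
there is no noise. If $\gamma = 0$ the teacher is infinitely larger than the student, i.e. the data
will be completely noisy.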

The generalization error, $\epsilon_g$, obtained by taking the average of
$\theta(-\sigma\tau)$ over normally distributed inputs, $s_j$ ($j = 1, \dots, M$), is

$$\epsilon_g = \frac{1}{\pi} \arccos(\gamma R) , \qquad (2.6)$$

where $R$ is the overlap between $w$ and $v_0$. For $R = 1$ we obtain the optimal value of the
generalization error, $\epsilon_{opt} = \frac{1}{\pi}\arccos(\gamma)$.

First the high temperature limit is considered. Then by using the replica method, the RS
approximation is studied, and finally the corrections given by one-step RSB are discussed.
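
The formula (2.6) can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes $\gamma = \sqrt{N/M}$, draws Gaussian inputs, builds a binary teacher of size $M$ and a binary student of size $N$ whose overlap with the first $N$ teacher weights is fixed by flipping a chosen number of components, and compares the measured disagreement rate with $\arccos(\gamma R)/\pi$. The function names and the sizes used are illustrative choices.

```python
import numpy as np

def generalization_error_formula(gamma, R):
    """Equation (2.6): eps_g = arccos(gamma * R) / pi."""
    return np.arccos(gamma * R) / np.pi

def simulate_generalization_error(N, M, n_flip, n_samples=20000, seed=0):
    """Monte Carlo estimate of the teacher/student disagreement rate."""
    rng = np.random.default_rng(seed)
    v = rng.choice([-1.0, 1.0], size=M)       # binary teacher weights
    w = v[:N].copy()                           # student starts as a copy of v_0
    w[:n_flip] *= -1.0                         # flipping n_flip weights gives R = 1 - 2 n_flip / N
    s = rng.standard_normal((n_samples, M))    # Gaussian inputs with unit variance
    tau = np.sign(s @ v)                       # teacher sees all M components
    sigma = np.sign(s[:, :N] @ w)              # student sees only the first N
    return float(np.mean(tau != sigma))

N, M, n_flip = 100, 200, 10
gamma = np.sqrt(N / M)                         # assumed definition of gamma
R = 1.0 - 2.0 * n_flip / N
eps_formula = float(generalization_error_formula(gamma, R))
eps_sim = simulate_generalization_error(N, M, n_flip)
print(f"formula: {eps_formula:.4f}  simulation: {eps_sim:.4f}")
```

For Gaussian inputs the two local fields are jointly Gaussian with correlation $\gamma R$, so the agreement should hold up to sampling error even at these modest sizes.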

2.1. High Temperature Limit 

In previous work [?] the high temperature limit has proven to be interesting since it is
both computationally easy and gives the general behavior of learning. It is defined so that
both $\alpha$ and $T$ approach infinity while $\alpha\beta$ remains constant. The free energy is
simply

$$\beta f = \frac{\alpha\beta}{\pi} \arccos(\gamma R) + \frac{1+R}{2}\ln\frac{1+R}{2} + \frac{1-R}{2}\ln\frac{1-R}{2} . \qquad (2.1.1)$$



The qualitative behavior of the learning curves can be divided into two types depending on
whether the noise level is above or below a particular value $\gamma_0$. For
$\gamma_0 < \gamma < 1$ there is, as in the realizable case, a range
$(\alpha\beta)_{sp1} < \alpha\beta < (\alpha\beta)_{sp2}$ for which $\beta f$ has two minima. In
between $(\alpha\beta)_{sp1}$ and $(\alpha\beta)_{sp2}$ there is a transition point
$(\alpha\beta)_{tr}$ at which the global properties of the minima change. In contrast to the
noiseless case, $(\alpha\beta)_{sp1} > 0$ and thus for $0 < \alpha\beta < (\alpha\beta)_{sp1}$
there is only one minimum. The minimum persisting also for $\alpha\beta > (\alpha\beta)_{sp2}$ is
close to $R = 1$ and approaches optimal performance as $\alpha\beta$ increases. Note that in
contrast to the realizable case there is no solution at $R = 1$. Typically $(\alpha\beta)_{tr}$,
$(\alpha\beta)_{sp1}$ and $(\alpha\beta)_{sp2}$ increase with decreasing $\gamma$ and merge at
$\gamma = \gamma_0$. This is illustrated in figure (??).

The two minima of $\beta f$ must be separated by a maximum, implying that
$\partial^2 (\beta f)/\partial R^2 = 0$ at the spinodal points. Using the saddle point equation,

$$\frac{\alpha\beta\gamma}{\pi\sqrt{1-\gamma^2 R^2}} = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) = \mathrm{arctanh}(R) , \qquad (2.1.2)$$

this condition can be written as

$$R = \tanh\left[\frac{1-\gamma^2 R^2}{\gamma^2 R (1-R^2)}\right] \equiv g(R,\gamma) . \qquad (2.1.3)$$

For $\gamma = 1$, (2.1.3) has one solution, $R_{sp} = 0.83$, resulting in
$(\alpha\beta)_{sp} = 2.08$ in agreement with [?]. In the region $\gamma_0 < \gamma < 1$,
(2.1.3) has two solutions giving $(\alpha\beta)_{sp1}$ and $(\alpha\beta)_{sp2}$. At
$\gamma = \gamma_0$ the two solutions merge and the two curves $R$ and $g(R,\gamma)$ are tangent
to each other. Thus $\gamma_0$ can be found by solving,

$$\frac{\partial g}{\partial R}(R,\gamma_0) = 1 , \qquad (2.1.4)$$

$$g(R,\gamma_0) = R , \qquad (2.1.5)$$

giving $\gamma_0 = 0.965$. For $\gamma < \gamma_0$, $\beta f$ has only one minimum (for all
$\alpha\beta$) which moves towards $R = 1$ as $\alpha\beta$ approaches infinity. Note that fairly
small amounts of noise will change the qualitative behavior from phase transition to no
transition.
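
The spinodal values quoted above can be reproduced with a few lines of code. The sketch below assumes the high temperature free energy $\beta f = (\alpha\beta/\pi)\arccos(\gamma R) - s(R)$ with the binary-weight entropy $s(R)$, so that the stationary branch is $\alpha\beta(R) = (\pi/\gamma)\sqrt{1-\gamma^2R^2}\,\mathrm{arctanh}(R)$ and the spinodal condition is the vanishing of $h(R) = (1-\gamma^2R^2) - \gamma^2 R(1-R^2)\,\mathrm{arctanh}(R)$. The helper names (`alpha_beta`, `h`, `spinodal_points`, `critical_gamma`) and the grid and tolerance settings are illustrative, not taken from the original work.

```python
import math

def alpha_beta(R, gamma):
    """alpha*beta along the stationary branch of the assumed high-T free energy."""
    return math.pi / gamma * math.sqrt(1.0 - (gamma * R) ** 2) * math.atanh(R)

def h(R, gamma):
    """h(R) = 0 is the spinodal condition d(alpha*beta)/dR = 0."""
    g2 = gamma * gamma
    return (1.0 - g2 * R * R) - g2 * R * (1.0 - R * R) * math.atanh(R)

def spinodal_points(gamma, n_grid=20000):
    """Locate all sign changes of h on (0, 1) and refine them by bisection."""
    eps = 1e-9
    grid = [eps + (1.0 - 2.0 * eps) * i / n_grid for i in range(n_grid + 1)]
    values = [h(R, gamma) for R in grid]
    roots = []
    for i in range(n_grid):
        if values[i] * values[i + 1] < 0.0:
            lo, hi = grid[i], grid[i + 1]
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if h(lo, gamma) * h(mid, gamma) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots

def critical_gamma(lo=0.95, hi=0.99, tol=1e-4):
    """Bisect on gamma between 'no spinodal' (lo) and 'spinodals exist' (hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if spinodal_points(mid):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for gamma in (1.0, 0.99, 0.95):
    pts = [(round(R, 4), round(alpha_beta(R, gamma), 3)) for R in spinodal_points(gamma)]
    print(gamma, pts)
print("gamma_0 ~", round(critical_gamma(), 3))
```

Parametrizing by $R$ rather than by $\alpha\beta$ keeps every quantity single valued, which is the same device used for the replica calculations in appendix A.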

In weight space this behavior can be understood as follows. In the noiseless case there
are, for small $\alpha$, two regions in weight space corresponding to the minima of $\beta f$,
one with poor generalization and one with good. If $\alpha$ is small enough the "poor" region
has the lowest free energy. As $\alpha$ increases the "poor" region moves towards the "good" and
for $\alpha > \alpha_{tr}$ the "good" region has the lowest free energy. Since for
$\alpha = \alpha_{tr}$ the "poor" and "good" regions are separated, there will be a phase
transition.

If noise is added, the sizes of these regions will increase. For low $\alpha$ there is only one
region in weight space corresponding to a minimum of $\beta f$. It will have poor generalization.
At $\alpha = \alpha_{sp1}$ another region corresponding to a free energy minimum appears. This
region gives better generalization. Again, as $\alpha$ increases the "poor" region moves towards
the "good" and for $\alpha > \alpha_{tr}$ the "good" region has the lowest free energy. Since for
$\alpha = \alpha_{tr}$ the "poor" and "good" regions are separated there will be a phase
transition. If the noise is increased the "poor" region is so large that when the "good" region
is created it will overlap with the "poor". Thus there is only one region, moving towards better
generalization, and there is no phase transition.

2.2. Replica Symmetric Theory 

Using the same methods as in [?] the RS approximation to the free energy, $\beta f_{RS}$, is
obtained as equation (2.2.1). The saddle point equations generated by the extremal condition in
(2.2.1) are given in appendix A. Here $q$ is the typical overlap between two different students
$w$. $R$, $\gamma$ and $\alpha$ have the interpretation given above. Using the saddle point
equations we can, given $\gamma$ and $\beta$, eliminate the auxiliary variables $\hat R$ and
$\hat q$, and find the dependence of $R$ (and $\epsilon_g$) on $\alpha$.

First consider the zero temperature case. This corresponds to only allowing students that
answer all questions correctly. If $\gamma < 1$ the training data is noisy and there is a maximum
size of the training set, $\alpha_c N$, beyond which no student can perform optimally.
$\alpha_c(\gamma)$ and $R_c(\gamma)$ are plotted in figure (??). For $\gamma = 0$ the known
result of Gardner is reproduced. Note that the curves do not give $\alpha_c \to \infty$ as
$\gamma \to 1$. However this may not be expected since the curves only give correct predictions
for states that are stationary points of $\beta f_{RS}$ and in the realizable case the state
$R = 1$ is not stationary, as was shown in [?]. For $\gamma = 1$ both the transition and the
spinodal points agree with the values found in [?]. A learning curve for $\gamma = 0.99$ is shown
in figure (??).

At $T > 0$ the learning behavior is the same as for the high temperature limit but with a
different $\gamma_0$, depending on $T$, and with $\epsilon_g$ and $q$ taking a definite asymptotic
form for large $\alpha$. For details on how to compute the asymptotic form see appendix B. For
some range of $\gamma$, $\gamma_A < \gamma < 1$, there is a phase transition already at zero
temperature while in a range $\gamma_B < \gamma < \gamma_A$ there is no transition at low
temperature. As the temperature is increased a transition develops, which is illustrated in
figure (??) for $\gamma = 0.99$. Finally when $\gamma < \gamma_B$ there seems to be no phase
transition no matter how high the temperature.

2.3. Replica Symmetry Breaking

In the RS approximation the entropy will always turn negative at some finite $\alpha$ and
therefore a region in the $(\alpha, T)$ plane for which the system exhibits replica symmetry
breaking (RSB) is expected, see figure (??). Analogous to [?] one step of RSB gives the free
energy (2.3.1), where the extremum is taken over $R$, $\hat R$, $q_0$, $\hat q_0$, $q_1$,
$\hat q_1$ and $m$. As in [?] the limit $q_1 \to 1$, $\hat q_1 \to \infty$ is considered,
implying that the stationary points of $f_{RSB}$ are given by the stationary points of $f_{RS}$
having zero entropy (see appendix A for details).

The learning behavior is analogous to the high temperature limit but with $\gamma_0 = 0.995$.
In appendix B the asymptotic form of $\epsilon_g$ and $q$ is computed, with
$\epsilon_g - \epsilon_{opt}$ decaying as $(\ln\alpha/\alpha)^2$ for large $\alpha$.
When $0.999 < \gamma < 1$, $\alpha_{tr}$ occurs in between $\alpha_{sp1}$ and $\alpha_{sp2}$
while for $0.996 < \gamma < 0.998$, $\alpha_{tr} = \alpha_{sp1}$, i.e. the state with better
generalization is stable as soon as it appears.

For the case $\gamma = 0.05$ the critical capacity $\alpha_c = 0.83$ ($q_c = 0.56$) is found,
which is compatible with the known results for $\gamma = 0$ [?]. Some values are given in table
(??) and some typical learning curves are given in figure (??).

In the noiseless limit the result is $\alpha_{sp} = 1.67$, correcting a previous result by
Seung et al. [?] ($\alpha_{sp} = 1.63$). The reason for this is given in appendix A.

It is also interesting to compare with some recently reported upper bounds for the Ising
perceptron [?]. In that article the asymptotic behavior was found to be the same as (2.3.5).
The authors found that the phase transition disappeared below $\gamma = 0.998$, thus not only
predicting the correct qualitative behavior but also giving a tight quantitative bound on
$\gamma_0$. Also, at $\gamma = 0.998$, they found $\alpha_{tr} = 2.6136$ whereas
$\alpha_{tr} = 1.83$ is obtained at the $\gamma_0$ given above using the replica method.



3. A Binary Committee Machine with Ising weights. 

Let the student and the teacher have $N$ ($K$) and $M$ ($L$) input (hidden) units respectively,
with $N < M$ and $K < L$. We can think of the student (teacher) as a committee of binary
perceptrons each of which has $N/K$ ($M/L$) input units. As the $l$th perceptron in the teacher
is presented an input $s_l$ the teacher evaluates,

$$\tau(s_1, \dots, s_L) = \mathrm{sgn}\left[\sum_{l=1}^{L} \mathrm{sgn}(v_l \cdot s_l)\right] , \qquad (3.1)$$

while the student computes,

$$\sigma(s_1^0, \dots, s_K^0) = \mathrm{sgn}\left[\sum_{k=1}^{K} \mathrm{sgn}(w_k \cdot s_k^0)\right] ,$$

as the $k$th perceptron in the student is given the input $s_k^0$. Here $s_l$ and $v_l$ are
elements of $\mathbb{R}^{M/L}$ whereas a zero superscript indicates that the vector is an element
of $\mathbb{R}^{N/K}$. When the student is presented the same set of input vectors, $\xi_l$
($l = 1, \dots, L$), as the teacher it only considers the first $N/K$ components of the first $K$
vectors in that set, $\xi_l^0$ ($l = 1, \dots, K$). Analogous to the simple perceptron we find
that this is equivalent to learning a noisy target rule in which $\eta$ and $\eta_k$ are
independent Gaussian random variables whose variances are fixed by two parameters.
$\gamma = \sqrt{K/L}$ is simply the relative number of hidden units of the student to the teacher
while $\delta = \sqrt{NL/(KM)}$ is the relative number of input units of a perceptron in the
student committee to a perceptron in the teacher committee. Thus $\gamma$ quantifies the noise in
the hidden layer and $\delta$ the noise in the input layer. If $\gamma = \delta = 1$ the
realizable case is recovered. Using these parameters the generalization error is found,



$$\epsilon_g = \frac{1}{\pi} \arccos[R_e] , \qquad (3.6)$$

where the effective order parameter is given by $R_e = \frac{2}{\pi}\gamma \arcsin(\delta R)$ and
$R$ is the typical overlap between $w_k$ and $v_k^0$. Here, analogous to Schwarze and Hertz [?],
it is assumed that $R$ is independent of the hidden unit index $k$.
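
As a small worked illustration of (3.6), the sketch below evaluates the generalization error of the tree committee machine from $R$, $\gamma$ and $\delta$, assuming the effective overlap $R_e = (2\gamma/\pi)\arcsin(\delta R)$. The function names are hypothetical, and the clamp to 1 only guards against floating point rounding in the realizable limit.

```python
import math

def effective_overlap(R, gamma, delta):
    """Assumed effective order parameter R_e = (2 gamma / pi) * arcsin(delta * R)."""
    return min(1.0, 2.0 * gamma / math.pi * math.asin(delta * R))

def committee_generalization_error(R, gamma, delta):
    """Equation (3.6): eps_g = arccos(R_e) / pi."""
    return math.acos(effective_overlap(R, gamma, delta)) / math.pi

def committee_optimal_error(gamma, delta):
    """Best reachable error for a given noise level, obtained at R = 1."""
    return committee_generalization_error(1.0, gamma, delta)

# Noise in either layer leaves a residual error even for a perfectly aligned student.
for gamma, delta in [(1.0, 1.0), (0.9, 1.0), (1.0, 0.9), (0.9, 0.9)]:
    print(gamma, delta, round(committee_optimal_error(gamma, delta), 4))
```

Only the realizable case $\gamma = \delta = 1$ gives a vanishing optimal error; for all other values the student can at best reach $\epsilon_{opt} > 0$.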

As for the perceptron case the high temperature limit is considered first. Then by using 
the replica method, the RS approximation is studied and finally the corrections given by one 
step of RSB are discussed. 

3.1. High Temperature Limit. 

Taking the limits $T \to \infty$ and $\alpha \to \infty$ while keeping $\alpha\beta$ fixed the
free energy is found,

$$\beta f = \frac{\alpha\beta}{\pi} \arccos(R_e) + \frac{1+R}{2}\ln\frac{1+R}{2} + \frac{1-R}{2}\ln\frac{1-R}{2} . \qquad (3.1.1)$$

If the noise level is low enough, there exist two spinodal points, $\alpha_{sp1}$ and
$\alpha_{sp2}$, with a phase transition in between. In contrast to the perceptron one finds that
if there is no input noise ($\delta = 1$) there is a phase transition to optimal performance at a
finite $\alpha$ for all values of $\gamma$. Given a $\gamma$ and that $\delta_0(\gamma) < \delta$,
a transition to a state approaching optimal learning in the large $\alpha$ limit is found. For
$\delta < \delta_0(\gamma)$ the transition vanishes and $\epsilon_g$ approaches $\epsilon_{opt}$
as $\alpha$ tends to infinity. In particular, if $\delta > \delta_a = \delta_0(0)$ there is
always a phase transition while for $\delta < \delta_b = \delta_0(1)$ there is no phase transition
independent of the hidden noise. By the same procedure as in section (2.1) one finds
$\delta_a = 0.965$, $\delta_b = 0.924$ and $\delta_0(\gamma)$ as shown in figure (??). Also here
$\alpha_{sp1}$, $\alpha_{tr}$ and $\alpha_{sp2}$ increase with increasing noise.
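
The presence or absence of spinodal points in the committee machine can be explored numerically in the same spirit as for the perceptron. The sketch below is an illustration under stated assumptions: it takes the high temperature free energy to be $\beta f = (\alpha\beta/\pi)\arccos(R_e) - s(R)$ with $R_e = (2\gamma/\pi)\arcsin(\delta R)$ and the binary-weight entropy $s(R)$, solves $\partial(\beta f)/\partial R = 0$ for $\alpha\beta$ as a function of $R$, and counts the turning points of that curve. The function names and grid size are hypothetical choices.

```python
import numpy as np

def alpha_beta_branch(R, gamma, delta):
    """alpha*beta on the stationary branch of the assumed committee free energy."""
    R_e = 2.0 * gamma / np.pi * np.arcsin(delta * R)
    return (np.pi ** 2 / (2.0 * gamma * delta)
            * np.sqrt(1.0 - R_e ** 2)
            * np.sqrt(1.0 - (delta * R) ** 2)
            * np.arctanh(R))

def count_spinodals(gamma, delta, n_grid=200001):
    """Count the turning points of alpha*beta(R) on (0, 1)."""
    R = np.linspace(1e-6, 1.0 - 1e-9, n_grid)
    slope = np.diff(alpha_beta_branch(R, gamma, delta))
    signs = np.sign(slope)
    signs = signs[signs != 0]                  # ignore exactly flat steps
    return int(np.sum(signs[1:] != signs[:-1]))

# Low input noise: two spinodal points. Higher input noise: none.
print(count_spinodals(1.0, 0.99))
print(count_spinodals(1.0, 0.90))
```

Scanning $\delta$ at fixed $\gamma$ with `count_spinodals` gives a rough numerical estimate of the boundary $\delta_0(\gamma)$ under these assumptions.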



3.2. Replica Symmetric Theory.

Analogous to Schwarze and Hertz [?] the RS estimate to the free energy, (3.2.1), is found,
where $R_e$ is given above and $q_e = \frac{2}{\pi}\arcsin q$. The value of $q$ is the typical
correlation between two students $w_k$, which is assumed to be independent of the perceptron
index $k$. The interpretations of $R$, $\gamma$, $\delta$ and $\alpha$ are as given above. By
using the equations generated by the extremal condition in (3.2.1) to eliminate $\hat R$ and
$\hat q$ we can find the dependence of $R$ (and $\epsilon_g$) on $\alpha$ given $\gamma$,
$\delta$ and $\beta$.

For $T = 0$ one should, as for the perceptron, find a critical capacity, $\alpha_c$, beyond
which the student cannot perform optimally on the training set. However, this is not the case,
implying that the RS approximation is bad. In the realizable case the values of both the
transition and the spinodal point agree with [?].

At $T > 0$ the behavior is much the same as in the high temperature limit with the
exception that for $\delta = 1$ the transition is not to an optimal state but to a state
approaching optimal learning as $\alpha$ tends to infinity. The asymptotic form of $\epsilon_g$
and $q$ for large $\alpha$ can be found for $\delta = 1$, $\gamma < 1$,

$$\epsilon_g - \epsilon_{opt} = B_1(\gamma, \beta) \, \ldots$$

and for $\delta < 1$, $\gamma < 1$,

$$\epsilon_g - \epsilon_{opt} = B_3(\gamma, \delta, \beta) \left(\frac{\ln\alpha}{\alpha}\right)^{1/3} , \qquad (3.2.7)$$

/, \4/3 

1-g = £4(7, 5, 0) f-^j . (3.2.8) 

The asymptotic behavior can be found by the same method used in appendix B. As for
the perceptron there is a range of noise levels for which there is no phase transition at low
temperature but one develops as the temperature is increased. One such example ($\gamma = 1$,
$\delta = 0.99$) is used in the phase diagram (??). If the noise is increased above some value
there seems to be no phase transition no matter how high the temperature.



3.3. Replica Symmetry Breaking.

As was said in the previous section the RS approximation fails to predict a critical
capacity. Also, the entropy will turn negative at some finite $\alpha$ and thus RSB is expected.
In figure (??) a phase diagram for $\gamma = 1$, $\delta = 0.99$ shows the RSB region. For one
step of RSB, in the limit $q_1 \to 1$, $\hat q_1 \to \infty$, the free energy $f_{RSB}$ is
obtained. For reasons analogous to those given in [?] for the perceptron the stationary points of
$f_{RSB}$ are given by the stationary points of $f_{RS}$ having zero entropy.

In contrast to the RS-case, but analogous to the high temperature limit, a transition to
optimal learning is found for all $\gamma$ if $\delta = 1$. Using the same notation as in section
(3.1) the values of $\delta_a$ and $\delta_b$ are 0.9995 and 0.9847 respectively, and
$\delta_0(\gamma)$ is given in figure (??).

For the case $\gamma = \delta = 0.05$ the critical capacity $\alpha_c = 0.95$ ($q_c = 0.31$) is
found, which is compatible with known results for $\gamma = \delta = 0$. Typically
$\alpha_{sp1}$, $\alpha_{tr}$ and $\alpha_{sp2}$ increase with increasing unrealizability until
$\delta = \delta_0(\gamma)$ where $\alpha_{sp1} = \alpha_{tr} = \alpha_{sp2}$. Some values of
$\alpha_{sp1}$, $\alpha_{tr}$ and $\alpha_{sp2}$ are given in table (??), and some typical
learning curves are given in figures (??) and (??).

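As $\alpha \to \infty$ the asymptotic forms of $\epsilon_g$ and $q$ are,

$$\epsilon_g - \epsilon_{opt} = A_1(\gamma, \delta) \left(\frac{\ln\alpha}{\alpha}\right)^{2/3} , \qquad (3.3.5)$$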



/1 \ 4/3 

1-q = A 2 (j,5) [^j . (3.3.G) 



Appendix B gives details of how to compute the asymptotic behavior, using the perceptron
as an example. In the realizable limit ($\delta = \gamma = 1$) the result is
$\alpha_{sp} = 2.79$, correcting the value found in [?] ($\alpha_{sp} = 2.58$). The reason for
the correction is given in appendix A, where the perceptron is used as an example.

4. Summary. 

In summary we have studied unrealizable learning in two large feed-forward neural
networks, a perceptron and a tree committee machine, within the replica symmetric ansatz as
well as for one step of replica symmetry breaking. The average generalization error has been
calculated as a function of the load parameter $\alpha$.

For the perceptron it was shown that using a noisy training set results in a generalization
error approaching optimal learning with increasing $\alpha$ according to a power law of
$(\ln\alpha/\alpha)^k$ with $k = 2$ in the RSB-case. If the noise is low enough there is a phase
transition at some finite $\alpha$ to a state which is close to $R = 1$. Increasing the noise
makes the transition go away.

For the tree committee machine a similar generalization behavior was found, the main
difference being that there is always a transition to optimal learning at some finite $\alpha$ if
there is no noise in the input layer. Typically, noise in the input layer gives worse
generalization behavior than noise in the hidden layer. For one step of RSB and with noise in the
input layer as well as in the hidden layer the asymptotic form of $\epsilon_g$ was found to be
$(\ln\alpha/\alpha)^k$ with $k = 2/3$.

In the realizable cases the values of $\alpha_{sp}$ correct previously reported results [?,?]
for the RSB spinodal point in the two machines. Here $\alpha_{sp} = 1.67$ was found for the
perceptron and $\alpha_{sp} = 2.79$ for the tree committee machine.

I thank J. Hertz for his valuable advice and direction and R. Urbanczik for many useful
discussions. Also, I would like to thank H. Schwarze for sharing the code written in
connection with ref. [?], which made it possible to sort out the disagreement on the spinodal
points.

A. The Saddle Point Equations 



In the limit $q_1 \to 1$, $\hat q_1 \to \infty$ the one step RSB free energy (2.3.1) of the
perceptron is related to the RS estimate (2.2.1) through

$$f_{RSB}(R, \hat R, q_0, \hat q_0, m, \beta) = \frac{1}{m} f_{RS}(R, m\hat R, q_0, m^2 \hat q_0, \beta m) . \qquad (A.1)$$

Stationarity with respect to $R$, $\hat R$, $q_0$ and $\hat q_0$ results in the relations
$q_0(T_{RSB}, m, \alpha) = q_{RS}(T_{RSB}/m, \alpha)$ and
$R(T_{RSB}, m, \alpha) = R_{RS}(T_{RSB}/m, \alpha)$ while stationarity with respect to $m$ gives
$s_{RS}(T_{RSB}/m, \alpha) = 0$, where $s_{RS}$ is the RS entropy. Thus one can find the
stationary points of $f_{RSB}$ by finding stationary points of $f_{RS}$ at a temperature
$T_{RS} = T_{RSB}/m$ for which the entropy is zero. The saddle point equations generated by
(2.2.1) are
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(A.2) 
(A.3) 
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with up = l/(e^ — 1), v = Myg/ (1 — g) and y = tJ (q — 7 2 i? 2 )/ (1 — g). Using (|A.2|) and 
(A.3) to eliminate $\hat q$ and $\hat R$ in (A.4) and (A.5) gives a system of two non-linear equations



$$q = \alpha \, h(R, q) , \qquad (A.6)$$

$$R = \alpha \, g(R, q) . \qquad (A.7)$$

At this point we could try to solve for $q$ and $R$ given $\alpha$. However since $\epsilon_g$
is a many-valued function of $\alpha$ it is more economical to eliminate $\alpha$. This will give
the equation,

$$q \, g(R, q) = R \, h(R, q) , \qquad (A.8)$$

which can be solved for $q$ given $R$. $\alpha$ can then be evaluated using (A.6) or (A.7). The
advantage is that $\epsilon_g$ is a single valued function of $R$. In the RSB-case this will be
helpful since more than one solution for each $\alpha$ has to be considered, as we show below.

Once a stationary point has been found its second order properties have to be checked
by computing the determinant of the Hessian matrix, $H$. Assume that the correct sign of
$\det H$, at $R > 0$, is given by the sign at $R = 0$. As $R$ is increased, the sign of $\det H$
will change first at $R_{sp2}$ and then again at $R_{sp1}$. Note that $R_{sp2} < R_{sp1}$ whereas
$\alpha_{sp1} < \alpha_{sp2}$. In the regime $R_{sp2} < R < R_{sp1}$, $\beta f$ has a stationary
point but it has the incorrect curvature.

Even though the RSB-case is solved by using the RS-equations, the determinant of the
Hessian matrix of $\beta f_{RSB}$, $\det H_{RSB}$, has to be used to determine the second order
properties. $\det H_{RSB}$ consists of the second derivatives of $\beta f_{RSB}$ with respect to
$R$, $q$, $\hat R$, $\hat q$ and $m$ whereas $\det H_{RS}$ is computed from the second
derivatives of $\beta f_{RS}$ with respect to $R$, $q$, $\hat R$ and $\hat q$. Using
$\det H_{RS} = 0$ as the criterion to determine the spinodal point (at $\gamma = 1$) would result
in the values of $\alpha_{sp}$ as given in [?] and [?]. Moreover, insisting on this RS-criterion
will, for some $\gamma$, result in regions of $\alpha$ where no solution exists. Thus this
procedure fails in a disastrous way. However the correct condition, $\det H_{RSB} = 0$, will cure
this problem and give $\alpha_{sp}$ as given in this paper.

In the RSB-case $\det H_{RS}$ will have the wrong sign close to $\alpha_{sp}$. This will
correspond to points on $\epsilon_g$ not considered in the RS-case since there exists another
solution (with $s > 0$) at the same $\alpha$ but with $\det H_{RS}$ having the correct sign. This
is illustrated (for $\gamma = 1$) in figure (??).

When the RS-equations are used to solve the RSB-case it is not possible to find the value
of $m$ (only $T_{RS} = T_{RSB}/m$). Since $\det H_{RSB}$ depends on $m$ it cannot be computed
directly. However it is possible to show that,

$$\det H_{RSB}(R, \hat R, q_0, \hat q_0, m) = \frac{1}{m} F(R, \hat R, q_0, \hat q_0) \qquad (A.9)$$

and since $0 < m < 1$, $\det H_{RSB}$ has the same sign as $F$.

B. Asymptotics

For large $\alpha$ the saddle point equations (A.2) imply that $q$, $R$ are close to 1 and
$\hat q$, $\hat R$ are large. From this the asymptotic form of the free energy (2.2.1) for
non-zero temperatures can be found,



PfRS 

G s 
G r 
H(x) 

m 



extr 



R,R,q,q 



G r (R, q, a, 7, (3) + G S (R, R, q, q) 



a(5 



■-a 



arctan 



7T 



y/q ~ gg 
7# 



a[0Q3) + 0(-/3)] 
^/2T 



1-g 



In 



l + {e p - l)H(w) 



(B.l) 
(B.2) 
(B.3) 
(B.4) 
(B.5) 



If $\gamma < 1$, (B.1) generates the saddle point equations,



l-R 

1-q 

Q 
R 



'2 q f i? 2N 
' — 4 / exp 

7r^ eXP l 2q 



■Ky/l - 7 2 

a/^7 
71VI - 7 2 



+ 



(B.6) 

(B.7) 
(B.8) 
(B.9) 



The first two of these can be combined into 



l-q R 
Using QB^) and (|Bg)-(g3g) results in, 



1 ~ R * (B.10) 



1-q ~ (B.ll) 
;i-i?) 3/2 ~ ^exp(-aA 2 (l- J R)) , (B.12) 



where $A_2$ depends only on $\beta$ and $\gamma$ and where $\sim$ means proportional to in the
asymptotic limit of large $\alpha$. In order to solve (B.12) the ansatz,

$$1 - R(\alpha) = A_0 \frac{\ln\alpha}{\alpha} + \delta(\alpha) , \qquad (B.13)$$

is made. For consistency it is important to check that $\delta(\alpha)$ is of lower order than
$\ln(\alpha)/\alpha$. Combining (B.12) with the ansatz (B.13) and choosing $A_0 = 1/A_2$ gives
the solution,

$$\epsilon_g - \epsilon_{opt} \sim 1 - R \sim \frac{\ln\alpha}{\alpha} . \qquad (B.14)$$

Also $\delta(\alpha)$ is found to be of order $\ln(\ln\alpha)/\alpha$ and thus of lower order.
The asymptotic form of $1 - q$ is now easily found using (B.14) and (B.11).
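
The consistency of the ansatz (B.13) can be illustrated numerically. The sketch below is only a toy check under an assumed form of the transcendental relation: it takes $x^{3/2} = \alpha^{-1/2} \exp(-A \alpha x)$ with $x = 1 - R$ and all proportionality constants set to one, a form chosen so that $A_0 = 1/A$ is the consistent leading coefficient. It solves for $x$ by fixed-point iteration and compares it with the leading term $\ln(\alpha)/(A\alpha)$. The function names are hypothetical.

```python
import math

def solve_gap(alpha, A=1.0, n_iter=200):
    """Solve x**1.5 = alpha**-0.5 * exp(-A*alpha*x) for x, via y = A*alpha*x."""
    # Taking logarithms gives y = ln(alpha) + 1.5*ln(A) - 1.5*ln(y),
    # which is a contraction for y > 1.5.
    c = math.log(alpha) + 1.5 * math.log(A)
    y = math.log(alpha)
    for _ in range(n_iter):
        y = c - 1.5 * math.log(y)
    return y / (A * alpha)

def ratio_to_leading_term(alpha, A=1.0):
    """Exact solution divided by the leading term A_0 * ln(alpha) / alpha with A_0 = 1/A."""
    return solve_gap(alpha, A) / (math.log(alpha) / (A * alpha))

for alpha in (1e3, 1e6, 1e12):
    print(f"alpha = {alpha:.0e}  ratio = {ratio_to_leading_term(alpha):.4f}")
```

The ratio approaches one only logarithmically slowly, which reflects a correction $\delta(\alpha)$ of relative size $\ln(\ln\alpha)/\ln\alpha$ in this toy relation.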



In the RSB-case the temperature is given by the zero entropy condition and cannot
be regarded as an arbitrary constant. Thus $\beta$ is a function of $\alpha$ and combining the saddle
point equations ( B.6| )- (|B.1Q ) with the asymptotic form of the zero entropy condition, q ~ 



1 -q) 1 / 2 , gives, 

1-q ~ (3\ (B.15) 
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l-R ~ 1 , (B.16) 

/? 5 / 2 ~ -^exp(-.B 2 a/3) , (B.17) 
Ja 



(B.18) 

where $B_2$ only depends on $\gamma$. Again an ansatz,
$\beta(\alpha) = B \ln(\alpha)/\alpha + \delta(\alpha)$, is made which together with
$B = 2/B_2$ gives the solution,

$$\beta(\alpha) \sim \frac{\ln\alpha}{\alpha} . \qquad (B.19)$$

The asymptotic form of $R$, $q$ and $\epsilon_g$ is now found from (B.15) and (B.16) giving

$$\epsilon_g - \epsilon_{opt} \sim 1 - R \sim \left(\frac{\ln\alpha}{\alpha}\right)^2 . \qquad (B.20)$$

For the tree committee machine the asymptotic forms of $R$, $q$ and $\epsilon_g$ can be found by
the same procedure but using the asymptotic form of the free energy (3.2.1).